Integration of deep learning with expectation maximization for spatial cue-based speech separation in reverberant conditions

Authors

Abstract

In this paper, we formulate a blind source separation (BSS) framework that integrates a U-Net based deep learning network with a probabilistic spatial machine learning expectation maximization (EM) algorithm for separating speech in reverberant conditions. The proposed model uses a pre-trained convolutional neural network, U-Net, in clustering the interaural level difference (ILD) cues and the interaural phase difference (IPD) cues. The integrated model exploits the complementary strengths of the two approaches to BSS: the strong modeling power of supervised networks and the ease of unsupervised algorithms, whose few parameters can be estimated on as little as a single segment of an audio mixture. The results show an average improvement of 4.3 dB in signal-to-distortion ratio (SDR) and 4.3% in short-time objective intelligibility (STOI) over the EM-based MESSL-GS (model-based expectation-maximization source separation and localization with garbage source), and of 4.5 dB in SDR and 8% in STOI over the U-Net based SONET, under conditions ranging from anechoic to those mostly encountered in the real world.
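For readers unfamiliar with the two spatial cues named in the abstract, the following is a minimal sketch of how ILD and IPD are commonly computed per time-frequency bin from a pair of left and right complex spectrograms. It uses the standard textbook definitions, not the paper's own feature pipeline; the function name `interaural_cues`, the `eps` guard, and the toy input are assumptions made for illustration.

```python
import numpy as np


def interaural_cues(left_spec, right_spec, eps=1e-12):
    """Return (ILD in dB, IPD in radians) for each time-frequency bin.

    Hypothetical helper: standard definitions, not the authors' code.
    left_spec / right_spec are complex STFT arrays of identical shape.
    """
    left_spec = np.asarray(left_spec, dtype=complex)
    right_spec = np.asarray(right_spec, dtype=complex)
    # ILD: magnitude ratio between the ears in decibels (eps avoids log(0)).
    ild_db = 20.0 * np.log10((np.abs(left_spec) + eps) / (np.abs(right_spec) + eps))
    # IPD: phase of left relative to right, wrapped to (-pi, pi].
    ipd_rad = np.angle(left_spec * np.conj(right_spec))
    return ild_db, ipd_rad


# Toy single-bin "spectrogram": the left channel is twice as loud
# and leads the right channel by 0.5 rad.
left = np.array([[2.0 * np.exp(1j * 0.5)]])
right = np.array([[1.0 + 0.0j]])
ild, ipd = interaural_cues(left, right)
```

In a clustering-based separator, cue maps like these are what get grouped into per-source time-frequency masks; how the grouping is split between the U-Net and EM is specific to the paper and is not reproduced here.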

Similar resources

Binaural Reverberant Speech Separation Based on Deep Neural Networks

Supervised learning has exhibited great potential for speech separation in recent years. In this paper, we focus on separating target speech in reverberant conditions from binaural inputs using supervised learning. Specifically, a deep neural network (DNN) is constructed to map from both spectral and spatial features to a training target. For spectral feature extraction, we first convert binaura...

Deep Ensemble Learning for Monaural Speech Separation

Monaural speech separation is a fundamental problem in robust speech processing. Recently, deep neural network (DNN) based speech separation methods, which predict either clean speech or an ideal time-frequency mask, have demonstrated remarkable performance improvement. However, a single DNN with a given window length does not leverage contextual information sufficiently, and the differences be...

A Feature Study for Masking-Based Reverberant Speech Separation

Monaural speech separation in reverberant conditions is very challenging. In masking-based separation, features extracted from speech mixtures are employed to predict a time-frequency mask. Robust feature extraction is crucial for the performance of supervised speech separation in adverse acoustic environments. Using objective speech intelligibility as the metric, we investigate a wide variety ...

Separation of Underdetermined Reverberant Speech Mixtures by Monaural, Binaural and Statistical Cue Combination

Underdetermined reverberant speech separation is a challenging problem in source separation that has received considerable attention in both computational auditory scene analysis (CASA) and blind source separation (BSS). Recent studies suggest that, in general, the performance of frequency domain BSS methods suffers from the permutation problem across frequencies which degrades in high reverbera...

Expectation-maximization analysis of spatial time series

Expectation maximization (EM) is used to estimate the parameters of a Gaussian Mixture Model for spatial time series data. The method is presented as an alternative and complement to Empirical Orthogonal Function (EOF) analysis. The resulting weights, associating time points with component distributions, are used to distinguish physical regimes. The method is applied to equatorial Pacific sea s...

Journal

Journal title: Applied Acoustics

Year: 2021

ISSN: 0003-682X, 1872-910X

DOI: https://doi.org/10.1016/j.apacoust.2021.108048